09. What's Next?
In this lesson, you've learned all about the REINFORCE algorithm, which we illustrated with a toy environment that has a discrete action space. It's important to note, however, that REINFORCE can also be used to solve environments with continuous action spaces!
For an environment with a continuous action space, the corresponding policy network could have an output layer that parametrizes a continuous probability distribution.
For instance, assume the output layer returns the mean \mu and variance \sigma^2 of a normal distribution.

Probability density function corresponding to normal distribution (Source: Wikipedia)
Then, to select an action, the agent need only pass the most recent state s_t as input to the network, and use the output mean \mu and variance \sigma^2 to sample from the distribution: a_t\sim\mathcal{N}(\mu, \sigma^2).
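As a concrete illustration, here is a minimal sketch of such a policy network in PyTorch, assuming a one-dimensional continuous action space; the class name GaussianPolicy, the layer sizes, and the helper method act are hypothetical choices for this example, not part of the lesson's code.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Hypothetical policy network whose output layer parametrizes
    a normal distribution (mean and variance) for a 1-D action."""

    def __init__(self, state_size, hidden_size=64):
        super().__init__()
        self.hidden = nn.Linear(state_size, hidden_size)
        self.mu_head = nn.Linear(hidden_size, 1)       # mean of the distribution
        self.log_var_head = nn.Linear(hidden_size, 1)  # log-variance (keeps sigma^2 > 0)

    def forward(self, state):
        x = torch.relu(self.hidden(state))
        mu = self.mu_head(x)
        sigma = torch.exp(0.5 * self.log_var_head(x))  # standard deviation
        return mu, sigma

    def act(self, state):
        """Sample an action a_t ~ N(mu, sigma^2) for the given state s_t."""
        mu, sigma = self.forward(state)
        dist = torch.distributions.Normal(mu, sigma)
        action = dist.sample()
        # The log-probability is what the REINFORCE update would use.
        return action.item(), dist.log_prob(action)


# Example usage with a made-up 4-dimensional state:
policy = GaussianPolicy(state_size=4)
state = torch.rand(1, 4)            # stand-in for the most recent state s_t
action, log_prob = policy.act(state)
```

Predicting the log-variance (rather than the variance directly) is a common trick to guarantee a positive \sigma^2 without clipping, but other parametrizations work just as well.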
This should work in theory, but it's unlikely to perform well in practice! To improve performance with continuous action spaces, we'll have to make some small modifications to the REINFORCE algorithm, and you'll learn more about these modifications in the upcoming lessons.